Method for controlling the actuation mechanisms of a plurality of components of a combine
Patent abstract:
A harvester includes a number of components for harvesting plants as the harvester traverses a plantation. The components take actions that harvest plants or facilitate their harvest. The harvester includes a number of sensors to measure the state of the harvester as the harvester harvests plants. The harvester includes a control system to generate actions for the components to harvest plants in the plantation. The control system includes an agent that executes a model that works to improve the harvester's performance in harvesting the plants. The performance improvement can be measured by the combine's sensors. The model is an artificial neural network that receives measurements as inputs and generates, as outputs, actions that improve performance. The artificial neural network is trained using actor-critic reinforcement learning techniques.
Publication number: BR112019019653A2
Application number: R112019019653
Filing date: 2018-03-21
Publication date: 2020-04-22
Inventors: Ehn Erik; Michael Flemming James; Kamp Redden Lee; Yu Wentao
Applicant: Blue River Tech Inc
IPC main class:
Patent description:
METHOD FOR CONTROLLING THE ACTUATION MECHANISMS OF A PLURALITY OF HARVESTER COMPONENTS

Field of the Invention

[0001] This application relates to a system for controlling a harvester in a plantation, and specifically to controlling the harvester using reinforcement learning methods.

Description of the State of the Art

[0002] Traditionally, harvesters are manually operated vehicles in which the machine includes manual or digital inputs, allowing the operator to control the various configurations of the combine. More recently, machine optimization programs have been introduced that aim to reduce the need for operator input. However, these algorithms still do not respond to a wide variety of machine and field conditions and therefore still require a significant amount of input from the operator. On some machines, the operator determines which machine performance parameter is unsatisfactory (less than ideal or unacceptable) and then manually steps through a machine optimization program using various control techniques. This process takes considerable time and requires significant interaction and knowledge from the operator. In addition, the process prevents the operator from monitoring field operations and paying attention to the surroundings while interacting with the machine. Thus, a harvester that improves or maintains its performance with fewer interactions and distractions for the operator is desirable.

Summary Description

[0003] A harvester can have any number of components to harvest plants as the harvester traverses a plantation. A component, or a combination of components, can perform an action to harvest plants in the field or an action that facilitates the harvesting of plants in the field. Each component is coupled to an actuator that drives the component to perform an action. Each actuator is controlled by an input controller that is communicatively coupled to a control system for the combine. The control system sends actions, such as machine commands, to the input controllers, which cause the actuators to activate their components. Thus, the control system generates actions that lead the components of the harvester to harvest plants in the plantation.

[0004] The combine can also include any number of sensors to measure the state of the combine. The sensors are communicatively coupled to the control system. A state measurement generates data that represents a combine configuration or capacity. A combine configuration is the current setting, speed, separation, position, etc. of a machine component. The machine's capacity is the result of a component action as the combine harvests plants on the plantation. Thus, the control system receives measurements of the state of the harvester as the harvester harvests plants in the field.

[0005] The control system can include an agent that generates actions for the components of the harvester and thereby improves its performance. Improved performance may include improving various plant harvest metrics quantified using the harvester, including the number of plants harvested, the quality of plants harvested, productivity, etc. Performance can be measured using any of the combine's sensors.

[0006] The agent can include a model that receives measurements from the combine as inputs and generates, as outputs, actions intended to improve performance. In one example, the model is an artificial neural network (ANN) including a number of input neural units in an input layer and a number of output neural units in an output layer. Each neural unit in the input layer is connected by a weighted connection to any number of neural units in the output layer. The neural units and weighted connections in the ANN represent the function of generating, from a measurement, an action that improves harvester performance. The weighted connections in the ANN are trained using an actor-critic reinforcement learning model.
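As a purely illustrative sketch of the kind of network summarized above (not the claimed implementation), the example below maps a state vector of combine measurements to an action vector through a single layer of weighted connections. The layer sizes, variable names, and use of NumPy are assumptions made for the example; in the described system the weights would be learned with an actor-critic reinforcement learning method rather than fixed at random.

```python
import numpy as np

# Assumed dimensions: one input neural unit per measurement,
# one output neural unit per controllable component action.
N_MEASUREMENTS = 6   # e.g. rotor speed, threshing clearance, grain yield, ...
N_ACTIONS = 4        # e.g. rotor speed command, fan speed command, ...

rng = np.random.default_rng(seed=0)

# Weighted connections from every input unit to every output unit.
# An actor-critic training procedure would adjust these weights.
weights = rng.normal(0.0, 0.1, size=(N_MEASUREMENTS, N_ACTIONS))
bias = np.zeros(N_ACTIONS)

def generate_actions(state: np.ndarray) -> np.ndarray:
    """Forward pass: measurements in, actions out (scaled to [-1, 1])."""
    return np.tanh(state @ weights + bias)

# Example with a made-up, normalized state vector.
state = np.array([0.4, -0.1, 0.7, 0.0, 0.3, -0.5])
print(generate_actions(state))
```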
Brief Description of the Drawings

[0007] Figures 1A and 1B are illustrations of a machine for manipulating plants in a field, according to an example.

[0008] Figure 2 is an illustration of a combine, including its constituent components and sensors, according to an exemplary embodiment.

[0009] Figures 3A and 3B are illustrations of a system environment for controlling the components of a machine configured to manipulate plants in a field, according to an exemplary embodiment.

[0010] Figure 4 is an illustration of the agent/environment relationship in reinforcement learning systems, according to an embodiment.

[0011] Figures 5A to 5E are illustrations of a reinforcement learning system, according to an embodiment.

[0012] Figure 6 is an illustration of an artificial neural network that can be used to generate actions in order to manipulate plants and improve the performance of the machine, according to an exemplary embodiment.

[0013] Figure 7 is a flow chart illustrating a method for generating actions that improve harvester performance using an agent 340 that executes a model 342 including an artificial neural network trained using an actor-critic method, according to an exemplary embodiment.

[0014] Figure 8 is an illustration of a computer that can be used to control the machine in order to manipulate plants in the plantation, according to an exemplary embodiment.

[0015] The figures present embodiments for illustration purposes only. Those skilled in the art will readily recognize, upon reading the following description, that alternative embodiments of the structures and methods illustrated herein can be employed without departing from the principles of the invention described herein.

Detailed Description

I. Introduction

[0016] Agricultural machines that affect (manipulate) plants in a field continue to improve over time. Agricultural machines can include a multitude of components to perform the task of harvesting plants in a field. They can also include any number of sensors that take measurements to monitor the performance of a component, a group of components, or the state of a component. Traditionally, measurements are reported to the operator, and the operator can manually change the configuration of the agricultural machine's components to improve performance. However, as the complexity of agricultural machinery increases, it becomes increasingly difficult for an operator to understand how a single change to a component affects the overall performance of the agricultural machine. Likewise, classic optimal control models that automatically adjust machine components are not viable, because the various processes involved in performing the machine's task are non-linear and highly complex, so the dynamics of the machine system are unknown.
[0017] Here we describe an agricultural machine that employs a machine learning model that automatically determines, in real time, actions that affect the machine's components and improve the machine's performance. In one example, the machine learning model is trained using a reinforcement learning technique. Models trained using reinforcement learning excel at recognizing patterns in large interconnected data structures, and are applied here to the measurements of an agricultural machine without input entered by an operator. The model can generate actions for the agricultural machine that improve the machine's performance based on the recognized patterns. Consequently, an agricultural machine is described that executes a model trained using reinforcement learning and that allows the agricultural machine to operate more efficiently, with less information entered by the operator. Among other benefits, this helps to reduce operator fatigue and distraction, for example, in cases where the operator is also driving the agricultural machine.

II. Plant Handling Machine

[0018] Figure 1 is an illustration of a machine for handling plants in a field, according to an exemplary embodiment. While the illustrated machine 100 is similar to a tractor pulling a farm implement, the system can be any type of system for handling plants 102 in a field. For example, the system can be a combine, a thresher, a seeder, a planter, an agricultural sprayer, etc. Plant handling machine 100 may include any number of detection mechanisms 110, handling components 120 (components), and control systems 130. Machine 100 may further include any number of mounting mechanisms 140, verification systems 150, power sources, digital memory, communication devices, or any other suitable component.

[0019] Machine 100 operates to manipulate one or more plants 102 within a geographical area 104. In various configurations, machine 100 manipulates plants 102 to regulate growth, harvest part of the plant, treat a plant with a fluid, monitor the plant, stop plant growth, remove a plant from the environment, or perform any other type of plant manipulation. Often, machine 100 directly manipulates a single plant 102 with a component 120, but it can also manipulate multiple plants 102, indirectly manipulate one or more plants 102 in the vicinity of machine 100, etc. In addition, machine 100 can manipulate a part of a single plant 102 instead of an entire plant 102. For example, in several embodiments, machine 100 can prune a single leaf of a large plant or remove an entire plant from the soil. In other configurations, machine 100 can manipulate the environment of plants 102 with various components 120. For example, machine 100 can remove soil to plant new plants in geographical area 104, remove unwanted objects from the soil in geographical area 104, etc.

[0020] Plants 102 may be crops, but may alternatively be weeds or any other suitable plant. The crop may be cotton, but alternatively it may be lettuce, soy, rice, carrots, tomatoes, corn, broccoli, cabbage, potatoes, wheat, or any other suitable commercial crop. The planting field in which the machine is used is an outdoor planting field, but may alternatively be composed of plants 102 within a greenhouse, a laboratory, a growing house, a set of containers, a machine, or any other suitable environment.
Plants 102 can be grown in one or more row crops (for example, beds), where the rows of plants are parallel, but can alternatively be grown in a set of plant pots, where the plant pots can be ordered in rows or matrices, randomly distributed, or maintained in any other suitable configuration. The rows of plants are generally spaced between 2 inches and 45 inches apart (for example, as determined from the longitudinal axis of the row), but may alternatively be spaced at any suitable distance or have variable spacing between rows. In other configurations, plants are not grown in rows.

[0021] Plants 102 within each plantation field, row crop, or plantation field subdivision generally include the same type of crop (for example, same genus, same species, etc.), but may alternatively include several crops or plants (for example, a first and a second plant), which can be manipulated independently. Each plant 102 may include a stem, disposed superiorly to (for example, above) the substrate, which supports the branches, leaves, and fruits of the plant. Each plant 102 can additionally include a root system attached to the stem, located below the substrate plane (for example, below the ground), which supports the plant's position and absorbs nutrients and water from the substrate 106. The plant can be a vascular plant, a non-vascular plant, a woody plant, a herbaceous plant, or any suitable type of plant. The plant can have a single stem, several stems, or any number of stems. The plant can have a taproot system or a fibrous root system. Substrate 106 is soil, but may alternatively be a sponge or any other suitable substrate. The components 120 of machine 100 can handle any type of plant 102, any part of plant 102, or any part of substrate 106 independently.

[0022] Machine 100 includes several detection mechanisms 110 configured to obtain images of plants 102 in the field. In some configurations, each detection mechanism 110 is configured to image a single row of plants 102, but can generate images of any number of plants in geographical area 104. The detection mechanisms 110 seek to identify individual plants 102, or parts of plants 102, as machine 100 traverses geographical area 104. Detection mechanism 110 can also identify elements of the environment around plants 102 from elements in geographical area 104. Detection mechanism 110 can be used to control any of the components 120, so that a component 120 manipulates an identified plant, part of a plant, or element of the environment. In various configurations, the detection system 110 can include any number of sensors that can take a measurement to identify a plant. The sensors can include a multispectral camera, a stereo camera, a CCD camera, a single lens camera, a hyperspectral imaging system, a LIDAR (Light Detection And Ranging) system, a dynamometer, an infrared camera, a thermal camera, or any other detection mechanism.

[0023] Each detection mechanism 110 can be coupled to machine 100 at a distance from a component 120. The detection mechanism 110 can be statically coupled to machine 100, but can also be movably coupled (for example, with a support) to machine 100. In general, machine 100 includes some detection mechanisms 110 positioned to capture data relating to a plant before component 120 encounters the plant, so that the plant can be identified before being manipulated.
In some configurations, component 120 and detection mechanism 110 are arranged so that the center lines of the detection mechanism 110 (for example, the center line of the detection mechanism's field of view) and of a component 120 are aligned, but they can alternatively be arranged so that the center lines are offset. Other detection mechanisms 110 can be arranged so as to observe the operation of one of the components 120 of the device, such as harvested grain passing through a plant storage component or harvested grain passing through a sorting component.

[0024] A component 120 of machine 100 operates to manipulate plants 102 as machine 100 crosses the geographical area. A component 120 of machine 100 may, alternatively or additionally, function to affect the performance of machine 100, even if it is not configured to manipulate a plant 102. In some examples, component 120 includes an active area 122 that is manipulated by the component 120. The effect of the manipulation may include necrosis of the plant, stimulation of plant growth, necrosis or removal of a portion of the plant, stimulation of the growth of a portion of the plant, or any other appropriate manipulation. Manipulation may include displacing plant 102 from substrate 106, disrupting plant 102 (for example, cutting it), fertilizing plant 102, watering plant 102, injecting one or more working fluids into the substrate adjacent to plant 102 (for example, within a threshold distance from the plant), harvesting a portion of plant 102, or otherwise manipulating plant 102.

[0025] Generally, each component 120 is controlled by an actuator. Each actuator is configured to position and activate each component 120, so that component 120 handles a plant 102 when instructed to do so. In various configurations, the actuator can position a component so that the active area 122 of component 120 is aligned with a plant to be manipulated. Each actuator is communicatively coupled to an input controller that receives machine commands from the control system 130 instructing component 120 to manipulate a plant 102. Component 120 can be configured to be in a standby mode, in which the component does not manipulate a plant 102 or affect the performance of machine 100, and a handling mode, in which component 120 is controlled by the actuation controller to manipulate the plant or affect the performance of machine 100. However, the component(s) 120 can be operated in any other suitable number of operating modes. In addition, an operating mode can have any number of submodes configured to control manipulation of plant 102 or affect machine performance.

[0026] Machine 100 may include a single component 120 or may include multiple components. The various components can be of the same type or of different types. In some configurations, a component can include any number of manipulation subcomponents that together perform the function of a single component 120. For example, a component 120 configured to spray treatment fluid onto a plant 102 can include subcomponents such as a nozzle, a valve, a manifold, and a treatment fluid reservoir. The subcomponents work together to spray treatment fluid onto a plant 102 in geographic area 104. In another example, a component 120 configured to move a plant 102 towards a storage component can include subcomponents such as an engine, a conveyor, a container, and an elevator.
The subcomponents work together to move the plant towards a storage component of the machine 100.

[0027] In one configuration example, machine 100 may additionally include a mounting mechanism 140 that works to provide a mounting point for the various elements of machine 100. In one example, mounting mechanism 140 statically retains and mechanically supports the positions of the detection mechanism(s) 110, component(s) 120, and verification system(s) 150 in relation to a longitudinal axis of the mounting mechanism 140. The mounting mechanism 140 is a chassis or frame, but it can alternatively be any other suitable mounting mechanism. In some configurations, there may be no mounting mechanism 140, or the mounting mechanism may be incorporated into any other component of machine 100.

[0028] In one example of machine 100, the system may also include a first set of coaxial wheels, with each wheel in the set being arranged along an opposite side of the mounting mechanism 140, and may additionally include a second set of coaxial wheels, where the rotational axis of the second set of wheels is parallel to the rotational axis of the first set of wheels. However, the system can include any suitable number of wheels in any suitable configuration. Machine 100 may also include a coupling mechanism 142, such as a hitch, which functions to attach removably or statically to a drive mechanism, such as a tractor, to the rear of the drive mechanism (so that machine 100 is dragged behind the drive mechanism), but alternatively in front of the drive mechanism or to the side of the drive mechanism. Alternatively, machine 100 may include the drive mechanism (for example, a motor and clutch coupled to the first and/or second set of wheels). In other exemplary systems, the system may have other means of traversing the field.

[0029] In some exemplary systems, the detection mechanism 110 can be mounted on the mounting mechanism 140 so that the detection mechanism 110 crosses a geographical location before component 120 crosses that geographical location. In a variation of machine 100, the detection mechanism 110 is statically mounted on the mounting mechanism 140 next to component 120. In variants that include a verification system 150, the verification system 150 is arranged distally from the detection mechanism 110, with component 120 arranged between them, so that the verification system 150 crosses the geographical location after the passage of component 120. However, the mounting mechanism 140 can retain the relative positions of the system components in any other suitable configuration. In other systems, the detection mechanism 110 can be incorporated into any other component of machine 100.

[0030] Machine 100 may include a verification system 150 that works to record a measurement of the system, the substrate, the geographical region, and/or the plants in the geographical area. The measurements are used to check or determine the state of the system, the state of the environment, the state of the substrate, the geographic region, or the extent of plant handling by machine 100. The verification system 150 can, in some configurations, record the measurements made by the verification system and/or access measurements previously made by the verification system 150.
The verification system 150 can be used to empirically determine the results of the operation of component 120 as machine 100 handles plants 102. In other configurations, the verification system 150 can access sensor measurements and derive additional measurements from the data. In some configurations of machine 100, the verification system 150 can be included in any other component of the system. The verification system 150 can be substantially similar to the detection mechanism 110 or different from the detection mechanism 110.

[0031] In various configurations, the sensors of a verification system 150 can include a multispectral camera, a stereo camera, a CCD camera, a single lens camera, a hyperspectral imaging system, a LIDAR (Light Detection and Ranging) system, a dynamometer, an infrared camera, a thermal camera, a humidity sensor, a light sensor, a temperature sensor, a speed sensor, an rpm sensor, a pressure sensor, or any other suitable sensor.

[0032] In some configurations, machine 100 may additionally include a power source, which works to power system components, including detection mechanism 110, control system 130, and component 120. The power source can be mounted on the mounting mechanism 140, can be removably attached to the mounting mechanism 140, or can be detached from the system (for example, located on the drive mechanism). The power source can be a rechargeable power source (for example, a set of rechargeable batteries), an energy-harvesting power source (for example, a solar system), a fuel-consuming power source (for example, a set of fuel cells or an internal combustion system), or any other suitable power source. In other configurations, the power source can be incorporated into any other component of machine 100.

[0033] In some configurations, machine 100 may additionally include a communication device, which functions to communicate (for example, send and/or receive) data between the control system 130, the identification system 110, the verification system 150, and components 120. The communication device can be a Wi-Fi communication system, a cellular communication system, a short-range communication system (for example, Bluetooth, NFC, etc.), a wired communication system, or any other suitable communication system.

III. Combine Harvester

[0034] In an exemplary embodiment, machine 100 is a harvester that crosses a geographical area and harvests plants 102. The harvester components 120 are configured to harvest a part of a plant in the field as machine 100 travels over plants 102 in geographic area 104. The combine includes several detection mechanisms 110 and verification systems 150 to monitor the agricultural performance of the combine as it traverses the geographical area. Agricultural performance can be quantified by control system 130 using any of the measurements from the various sensors of machine 100. In various configurations, performance can be based on metrics including the quantity of plants harvested, the plant threshing quality, the cleanliness of the harvested grain, the yield of the harvester, and the loss of plants by the harvester.

[0035] Figure 2 is an example of combine 200, illustrating components 120, detection system 110, and verification system 150 of combine 200, according to an exemplary embodiment. Harvester 200 comprises a chassis 202, which is supported by wheels 204 to be driven over the ground and harvest crops (a plant 102).
Wheels 204 may come into direct contact with the ground or ride on tracks. A feeder 206 extends from the front of the combine 200. The feeder lift cylinders 207 extend between the chassis of the combine 200 and the feeder to raise and lower the feeder (and therefore the agricultural head 208) in relation to the ground. An agricultural head 208 is supported in front of feeder 206. When agricultural harvester 200 operates, it carries feeder 206 along the crops in the field. The feeder 206 transports the crop gathered by the agricultural head 208 back and up into the body of the agricultural combine 200.

[0036] Once inside the agricultural harvester 200, the crop is transported to the separator, which comprises a rotor 210, which is cylindrical, and a threshing bucket or threshing basket 212. The threshing basket 212 surrounds rotor 210 and is stationary. Rotor 210 is driven in rotation by a controllable internal combustion engine 214. In some configurations, rotor 210 includes separator vanes, a series of extensions on the drum of rotor 210 that guide the crop material from the front of rotor 210 to the rear of rotor 210 as rotor 210 rotates. The separator vanes are angled in relation to the flow of the crop over the rotor at a vane angle. The angle of the separator vanes is controllable by an actuator. The vane angle can affect the quantity and quality of the grain that reaches the threshing basket 212. The crop material is transported into the space between the rotor 210 and the threshing basket 212 and is threshed and separated into a grain component and a MOG component (material other than grain). The distance between the rotor 210 and the threshing basket 212 (the threshing clearance distance) is controllable by an actuator. The threshing clearance distance can affect the quality of the harvested plant. That is, changing the threshing clearance distance can change the relative quantities of unthreshed plant, material other than grain, and usable grain that are processed by the machine 100.

[0037] The MOG is transported backwards and released between the rotor 210 and the threshing basket 212. It is then received by a re-thresher 216, where the remaining grains are released. The now separated MOG is released behind the vehicle to fall to the ground.

[0038] Most of the grain separated in the separator (and part of the MOG) falls through openings in the threshing basket 212. From there, it falls into a cleaning shoe 218.

[0039] The cleaning shoe 218 has two sieves: an upper sieve 220 and a lower sieve 222. Each sieve has a sieve separation that allows grain and MOG to fall through, and the separation of each sieve is controllable by an actuator. The sieve separation can affect the quality and type of grain that falls through the cleaning shoe 218. A fan 224 that is controllable by an actuator is provided in front of the cleaning shoe to blow air back under the sieves. This air passes upwards through the sieves and lifts straw, husks, stalks, and other small particles of MOG (in addition to a small portion of grain). The air carries this material back to the rear end of the sieves. A motor 225 drives fan 224.
[0040] Most of the grain that enters the cleaning shoe 218, however, is not carried backwards; it passes down through the upper sieve 220 and then through the lower sieve 222.

[0041] Of the material carried by the air from fan 224 to the rear of the sieves, the smaller particles of MOG are blown out of the rear of the combine. Larger MOG particles and grain are not blown out of the rear of the combine, but fall from the cleaning shoe 218 onto a shoe loss sensor 221, located on the left side of the cleaning shoe 218 and configured to detect shoe loss on the left side of the cleaning shoe 218, and a shoe loss sensor 223, located on the right side of the cleaning shoe 218 and configured to detect shoe loss on the right side of the cleaning shoe 218. The shoe loss sensor 223 can provide a signal that is indicative of the amount of material (which may include mixed grain and MOG) transported to the rear of the cleaning shoe as it falls on the right side of the cleaning shoe 218.

[0042] The heavier material transported to the rear of the upper sieve 220 and the lower sieve 222 falls onto a plate and is then carried by gravity down into a helical chute 227. This heavier material is called tailings and is usually a mixture of grain and MOG.

[0043] The grain that passes through the upper sieve 220 and the lower sieve 222 falls into a helical chute 226. Generally, the upper sieve 220 has a larger sieve separation than the lower sieve 222, so that the upper sieve 220 filters out the larger MOG and the lower sieve 222 filters out the smaller MOG. Generally, the material that passes through both sieves has a higher proportion of clean grain compared to MOG. A clean grain auger 228 arranged in the auger chute 226 carries the material to the right side of the agricultural combine 200 and deposits the grain at the lower end of the grain elevator 215. The grain lifted by the grain elevator 215 is transported upward until it reaches the top outlet of the grain elevator 215. The grain is then released from the grain elevator 215 and falls into a grain tank 217. Grain entering the grain tank 217 can be measured by several characteristics, including quantity, mass, volume, cleanliness (quantity of MOG), and quality (quantity of usable grain).

III. Control System Network

[0044] Figures 3A and 3B are high-level illustrations of a network environment 300, according to an exemplary embodiment. The machine 100 includes a digital data network environment that connects the control system 130, the detection system 110, the components 120, and the verification system 150 through a network 310. The various elements connected in the environment 300 include several input controllers 320 and sensors 330 to receive and generate data in environment 300. Input controllers 320 are configured to receive data through the network 310 (for example, from other sensors 330, such as those associated with the detection system 110) or from their associated sensors 330 and to control (for example, actuate) their associated component 120 or their associated sensors 330. In general terms, sensors 330 are configured to generate data (i.e., measurements) representing a configuration or capacity of machine 100. A capacity of machine 100, as referred to in this document, is, in general terms, the result of an action of component 120 when machine 100 manipulates plants 102 (performs actions) in a geographical area 104.
In addition, a configuration of machine 100, as referred to in this document, is, in general terms, the current speed, position, setting, actuation level, angle, etc. of a component 120 when machine 100 performs actions. A measurement of the configuration and/or capacity of a component 120 or of machine 100 can be, more generally and as referred to in this document, a measurement of the state of machine 100. That is, the several sensors 330 can monitor components 120, a geographical area 104, plants 102, the state of machine 100, or any other aspect of machine 100.

[0045] An agent 340 running in the control system 130 inserts the measurements received through the network 310 into a control model 342 as a state vector. The elements of the state vector can include numerical representations of features or system states generated from the measurements. Control model 342 generates an action vector for machine 100 that model 342 predicts will improve the performance of machine 100. Each element of the action vector can be a numerical representation of an action that the system can execute to manipulate a plant, manipulate the environment, or affect the performance of machine 100. The control system 130 sends machine commands to the input controllers 320 based on the elements of the action vectors. The input controllers receive the machine commands and actuate their components 120 to perform an action. Generally, the action leads to an improvement in the performance of machine 100.

[0046] In some configurations, control system 130 may include an interface 350. Interface 350 allows the user to interact with control system 130 and control various aspects of machine 100. Generally, interface 350 includes an input device and a display device. The input device can be one or more of a keyboard, button, touchscreen, lever, dial, potentiometer, variable resistor, shaft encoder, or other device or combination of devices configured to receive input from a user of the system. The display device can be a CRT, LCD, plasma screen, or other display technology or combination of display technologies configured to provide system information to a system user. The interface can be used to control various aspects of agent 340 and model 342.

[0047] The network 310 can be any system capable of communicating data and information between elements within the environment 300. In various configurations, the network 310 is a wired network, a wireless network, or a mixed wired and wireless network. In an exemplary embodiment, the network is a controller area network (CAN) and the elements within the environment 300 communicate with each other via a CAN bus.

III.A Example of Control System Network

[0048] Referring again to Figure 3A, Figure 3A illustrates an example environment 300A for a machine 100. In this example, control system 130 is connected to a first component 120A and a second component 120B. The first component 120A includes an input controller 320A, a first sensor 330A, and a second sensor 330B. The input controller 320A receives machine commands from the network 310 and drives the component 120A in response. The first sensor 330A generates measurements representing a first state of component 120A, and the second sensor 330B generates measurements representing a configuration of the first component 120A when manipulating plants.
The second component 120B includes an input controller 320B. Control system 130 is connected to a detection system 110 including a sensor 330C configured to generate measurements to identify plants 102. Finally, control system 130 is connected to a verification system 150 that includes an input controller 320C and a sensor 330D. In this case, the input controller 320C receives machine commands that control the position and detection features of the sensor 330D. The sensor 330D is configured to generate data that represents the capacity of the component 120B that affects the performance of machine 100.

[0049] In various other configurations, the machine 100 can include any number of detection systems 110, components 120, verification systems 150, and/or networks 310. Therefore, environment 300A can be configured in a different way than illustrated in Figure 3A. For example, the environment 300 may include any number of components 120, verification systems 150, and detection systems 110, with each element including various combinations of input controllers 320 and/or sensors 330.

III.B Combine Control System Network

[0050] Figure 3B is a high-level illustration of a network environment 300B for the combine 200 illustrated in Figure 2, according to an exemplary embodiment. In this illustration, for clarity, the elements of environment 300B are grouped as input controllers 320 and sensors 330 rather than as their constituent elements (component 120, verification system 150, etc.).

[0051] The sensors 330 include a separator loss sensor 219, shoe loss sensors 221/223, a rotor speed sensor 360, a threshing clearance sensor 362, a grain yield sensor 364, a tailings sensor 366, a threshing load sensor 368, a grain quality sensor 370, a straw quality sensor 374, a head height sensor 376, and a feeder mass flow sensor 378, but can include any other sensor 330 that can determine a state of the combine 200.

[0052] The separator loss sensor 219 can provide a measurement of the amount of grain that has been transported to the rear of the separator. In one configuration, the separator loss sensor 219 is located at the end of the rotor 210 and of the threshing basket 212. In one configuration, the separator loss sensor can additionally include a threshing loss sensor. The threshing loss sensor can provide a measure of the amount of grain that is lost after threshing. In one configuration, the threshing loss sensor is located next to threshing basket 212.

[0053] Shoe loss sensors 221 and 223 can provide a measurement that represents the amount of material (which may include mixed grain and MOG) transported to the rear of the cleaning shoe and falling down the sides (left and right, respectively) of the cleaning shoe 218. The shoe loss sensors are located at the end of the shoe.

[0054] The rotor speed sensor 360 can provide a measurement representing the speed of rotor 210. The faster rotor 210 rotates, the faster it threshes the crop. At the same time, when the rotor spins more quickly, it damages a larger proportion of the grain. Thus, by varying the speed of the rotor, the proportion of threshed grain and the proportion of damaged grain can change. In one configuration, the rotor speed sensor 360 can be a shaft speed sensor and measure the speed of rotor 210 directly.
[0055] In another configuration, the rotor speed sensor 360 can be a combination of other sensors that cumulatively provide a measurement representing the speed of rotor 210. Examples include a hydraulic fluid flow rate sensor that measures the flow of fluid through a hydraulic motor that drives rotor 210; a speed sensor of the internal combustion engine 214 in conjunction with another measurement that indicates a selected gear ratio of a clutch between the internal combustion engine 214 and the rotor 210; or a swash plate position sensor and shaft speed sensor of a hydraulic pump that supplies hydraulic fluid to a hydraulic motor that drives rotor 210.

[0056] The threshing clearance sensor 362 can provide a measurement representing the clearance between the rotor 210 and the threshing basket 212. As the clearance is reduced, the plant is threshed more vigorously, reducing the separator loss. At the same time, a reduced clearance produces greater damage to the grain. Thus, by changing the threshing clearance, the separator loss and the amount of damaged grain can be changed. In another configuration, the threshing clearance sensor 362 additionally includes a separator vane sensor. The separator vane sensor can provide a measurement representing the vane angle. The vanes can increase or reduce the amount of plant being threshed and, consequently, reduce the separator loss. At the same time, the vane angle can produce greater damage to the grain. Thus, by changing the vane angle, the separator loss and the amount of damaged grain can be changed.

[0057] The grain yield sensor 364 can provide a measurement representing a clean grain flow rate. The grain yield sensor may include an impact sensor located adjacent to an outlet of the grain elevator 215, where the grain enters the grain tank 217. In this configuration, the grain carried upward in the grain elevator 215 strikes the grain yield sensor 364 with a force equivalent to the mass flow rate of grain into the grain tank. In another configuration, the grain yield sensor 364 is coupled to a motor (not shown) that drives the grain elevator 215 and can provide a measurement representing the load on the motor. The motor load represents the amount of grain transported upwards by the grain elevator 215. In another configuration, the motor load can be determined by measuring the current and/or voltage in the motor (in the case of an electric motor). In another configuration, the motor can be a hydraulic motor, and the motor load can be determined by measuring the fluid flow rate in the motor and/or the hydraulic pressure in the motor.

[0058] The tailings sensor 366 and the grain quality sensor 370 can each provide a measurement representing the quality of the grain. The measurement can be one or more of the following: a measurement representing an amount or proportion of usable grain, a measurement representing the amount or proportion of damaged grain (for example, cracked or broken grain), a measurement representing the amount or proportion of MOG mixed with the grain (which can further be characterized as an amount or proportion of different types of MOG, such as light MOG or heavy MOG), and a measurement representing the amount or proportion of unthreshed grain.
[0059] In one configuration, the grain quality sensor 370 is located on a grain flow path between the clean grain auger 228 and the grain tank 217. That is, the grain quality sensor 370 is located adjacent to the grain elevator 215 and, more particularly, the grain quality sensor 370 is located so as to receive grain samples from the grain elevator 215 and detect characteristics of the grain sampled from the grain elevator 215.

[0060] In one configuration, the tailings sensor 366 is located on a grain flow path between the tailings 229 and the front end of rotor 210, where the tailings are released from the tailings elevator 231 and are deposited between rotor 210 and threshing basket 212 for re-threshing. That is, the tailings sensor 366 is located adjacent to the tailings elevator 231 and, more particularly, the tailings sensor 366 is located so as to receive grain samples from the tailings elevator 231 and to detect characteristics of the grain from the tailings elevator 231.

[0061] The threshing load sensor 368 can provide a measurement representing the threshing load (i.e., the load applied to rotor 210). In one configuration, the threshing load sensor 368 comprises a hydraulic pressure sensor arranged to detect the pressure in a motor that drives rotor 210. In another configuration (in the case of a rotor 210 that is driven by a belt), the threshing load sensor 368 includes a sensor configured to detect the hydraulic pressure applied to a variable-diameter sheave at the rear end of rotor 210, by which rotor 210 is coupled to and driven by a drive belt. In another configuration, the threshing load sensor 368 may include a torque sensor configured to detect torque on the shaft that drives rotor 210.

[0062] In one configuration, the tailings sensor 366 and the grain quality sensor 370 each include a digital camera configured to capture an image of a grain sample. In this case, the control system 130 or the tailings sensor 366 can be configured to interpret the captured image and determine the quality of the grain sample.

[0063] The straw quality sensor 374 can provide at least one measurement that represents the quality of the straw (that is, MOG) coming out of the combine 200. Straw quality represents a physical characteristic (or characteristics) of the straw and/or windrows that accumulate behind the combine 200. In certain regions of the world, straw, usually gathered in windrows, is later collected and sold or used. The dimensions (length, width, and height) of the straw and/or windrow can be a factor in determining its value. For example, short straw is particularly valuable for use as animal feed. Long straw is particularly valuable for use as animal bedding. Long straw allows the formation of tall, open, and airy windrows. These windrows dry more quickly in the field and (due to their height above the ground) are picked up by balers with less dirt and other soil contaminants.

[0064] In one configuration, the straw quality sensor 374 comprises a camera aimed at the rear of the harvester to take a picture of the straw as it leaves the harvester and is suspended in the air before falling to the ground, or to take a picture of the windrow created by the falling straw.
In this configuration, the straw quality sensor 374 or the control system 130 can be configured to access or receive the image from the camera, process it, and characterize the length of the straw or characterize the dimensions of the windrow created by the straw on the ground behind the harvester 200. In another configuration, the straw quality sensor 374 comprises a range detector, such as a laser scanner or ultrasonic sensor directed at the straw, that can determine the dimensions of the straw and/or windrows.

[0065] The head height sensor 376 can provide a measurement representing the height of the combine head 208 in relation to the ground. In one configuration, the head height sensor 376 comprises a rotating sensor element, such as a shaft encoder, potentiometer, or variable resistor, to which an elongated arm is coupled. The remote end of the arm drags on the ground and, as the combine head 208 changes height, the arm changes angle and rotates the rotating sensor element. In another configuration, the head height sensor 376 comprises an ultrasonic or laser rangefinder.

[0066] The feeder mass flow sensor 378 can provide a measurement that represents the thickness of the material drawn into the feeder and into the agricultural harvester 200 itself. Generally, there is a correlation between this mass and the yield of the harvest (i.e., grain yield). The control system 130 can be configured to calculate grain yield by combining a measurement from the head height sensor 376 and a measurement from the feeder mass flow sensor 378 with agronomic tables stored in memory circuits of the control system 130. This configuration can be used in addition to, or alternatively to, a measurement from the grain yield sensor 364 to provide a measurement representing the flow rate of clean grain.

[0067] The combine speed sensor 372 is any combination of sensors that can provide a measurement representing the speed of the combine in geographic area 104. Speed sensors can include GPS sensors, engine load sensors, accelerometers, gyroscopes, gear sensors, or any other sensor or combination of sensors that can determine speed.

[0068] The input controllers 320 include an upper sieve controller 380, a lower sieve controller 382, a rotor speed controller 384, a fan speed controller 386, a vehicle speed controller 388, a threshing clearance controller 390, and a head height controller 392, but can include any other input controller that can control a component 120, the identification system 110, or the verification system 150. Each of the input controllers 320 is communicatively coupled to an actuator that can drive its coupled element. Generally, the input controller can receive machine commands from control system 130 and drive a component 120 with the actuator in response.

[0069] The upper sieve controller 380 is coupled to the upper sieve 220 and is configured to change the angle of the individual sieve elements (slats) that comprise the upper sieve 220. By changing the position (angle) of the individual sieve elements, the amount of air passing through the upper sieve 220 can be varied to increase or decrease (as desired) the vigor with which the grain is sieved.
[0070] The lower sieve controller 382 is coupled to the lower sieve 222 and is configured to change the angle of the individual sieve elements (slats) that comprise the lower sieve 222. By changing the position (angle) of the individual sieve elements, the amount of air passing through the lower sieve 222 can be varied to increase or decrease (as desired) the vigor with which the grain is sieved.

[0071] The rotor speed controller 384 is coupled to variable drive elements located between the internal combustion engine 214 and the rotor 210. These variable drive elements may include clutches, gear sets, hydraulic pumps, hydraulic motors, electric generators, electric motors, pulleys with a variable working diameter, belts, shafts, belt drives, IVTs, CVTs, and the like (as well as their combinations). The rotor speed controller 384 controls the variable drive elements and is configured to vary the speed of rotor 210.

[0072] The fan speed controller 386 is coupled to variable drive elements arranged between the internal combustion engine 214 and the fan 224 to drive the fan 224. These variable drive elements can include clutches, gear sets, hydraulic pumps, hydraulic motors, electric generators, electric motors, pulleys with variable drive-belt diameters, belt drives, IVTs, CVTs, and the like (as well as all their combinations). The fan speed controller 386 is configured to control the variable drive elements to vary the speed of the fan 224. These variable drive elements are shown symbolically in Figure 1 as motor 225.

[0073] The vehicle speed controller 388 is coupled to variable drive elements located between the internal combustion engine 214 and one or more of the wheels 204. These variable drive elements may include hydraulic or electric motors coupled to the wheels 204 to drive the rotation of the wheels 204. The vehicle speed controller 388 is configured to control the variable drive elements, which in turn control the speed of the wheels 204 by varying a hydraulic or electrical flow through the motors that drive the rotation of the wheels 204 and/or by varying a gear ratio of a gearbox coupled between the motors and the wheels 204. The wheels 204 can rest directly on the ground or they can rest on a track or recirculating belt that is arranged between the wheels and the ground.

[0074] The threshing clearance controller 390 is coupled to one or more threshing clearance actuators 391, 394 that are coupled to the threshing basket 212. The threshing clearance controller is configured to change the clearance between rotor 210 and the threshing basket 212. Alternatively, the threshing clearance actuators 391 are coupled to the threshing basket 212 to change the position of the threshing basket 212 in relation to rotor 210. The actuators may comprise hydraulic or electric motors of the rotary-action or linear-action varieties.

[0075] The head height controller 392 is coupled to valves (not shown) that control the flow of hydraulic fluid to and from the feeder lift cylinders 207. The head height controller 392 is configured to control the feeder by selectively raising and lowering the feeder and, consequently, the combine head 208.
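To make the connection between the sensors and controllers named above and the state and action vectors used by the agent more concrete, the following is a minimal sketch. The dictionary keys, units, and command values are illustrative assumptions only; the application does not specify a particular encoding.

```python
# Hypothetical snapshot of combine measurements (units are assumptions).
measurements = {
    "rotor_speed_sensor_360": 950.0,         # rpm
    "threshing_clearance_sensor_362": 18.0,  # mm
    "grain_yield_sensor_364": 12.5,          # kg/s of clean grain
    "shoe_loss_sensor_221_223": 0.8,         # relative loss signal
    "combine_speed_sensor_372": 6.2,         # km/h
}

# Order of the state vector elements (an assumption for the example).
STATE_ORDER = list(measurements)

def build_state_vector(m: dict) -> list[float]:
    """Pack sensor measurements into the state vector fed to model 342."""
    return [m[name] for name in STATE_ORDER]

# Order of the action vector elements and the input controllers they drive.
ACTION_ORDER = [
    "rotor_speed_controller_384",
    "fan_speed_controller_386",
    "vehicle_speed_controller_388",
    "threshing_clearance_controller_390",
]

def to_machine_commands(action_vector: list[float]) -> dict:
    """Unpack the model's action vector into per-controller machine commands."""
    return dict(zip(ACTION_ORDER, action_vector))

state = build_state_vector(measurements)
commands = to_machine_commands([1000.0, 850.0, 5.8, 16.0])
print(state)
print(commands)  # e.g. {'rotor_speed_controller_384': 1000.0, ...}
```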
IV. Control System Agent

[0076] As described above, control system 130 runs an agent 340 that can control the various components 120 of machine 100 in real time and acts to improve the performance of that machine 100. Generally, the agent 340 is any program or method that can receive measurements from sensors 330 on machine 100 and generate machine commands for the input controllers 320 coupled to components 120 of machine 100. The generated machine commands cause the input controllers 320 to actuate the components 120 and change their state and, consequently, change their performance. The changed state of the components 120 improves the overall performance of the machine 100.

[0077] In one embodiment, the agent 340 executed by control system 130 can be described as executing the following function:

a = F(s)      (4.1)

where s is an input state vector, a is an output action vector, and the function F is a machine learning model that works to generate output action vectors that improve the performance of machine 100, given input state vectors.

[0078] Generally, the input state vector s is a representation of the measurements received from sensors 330 of the machine 100. In some cases, the elements of the input state vector s are the measurements themselves, while in other cases the control system 130 determines an input state vector s from the measurements m using an input function I, such as:

s = I(m)      (4.2)

where the input function I can be any function that can convert measurements from machine 100 into elements of an input state vector. In some cases, the input function can calculate differences between an input state vector and a previous input state vector (for example, from a previous step). In other cases, the input function can manipulate the input state vector in a way that is compatible with the function F (for example, removing errors, ensuring that the elements are within limits, etc.).

[0079] In addition, the output action vector a is a representation of machine commands c that can be transmitted to the input controllers 320 of machine 100. In some cases, the elements of the output action vector a are the machine commands themselves, while in other cases the control system 130 determines machine commands from the output action vector a using an output function O:

c = O(a)      (4.3)

where the output function O can be any function that can convert the output action vector into machine commands for the input controllers 320. In some examples, the output function can work to ensure that the generated machine commands are within the tolerances of their respective components 120 (for example, do not rotate too fast, do not open too far, etc.).

[0080] In several other configurations, the machine learning model can use any function or method to model the unknown dynamics of the machine 100. In this case, agent 340 can use a dynamic model 342 to dynamically generate machine commands to control the machine 100 and improve the performance of machine 100. In several configurations, the model can be composed of function approximators, dynamic probabilistic models such as Gaussian processes, neural networks, or any other similar model. In various configurations, agent 340 and model 342 can be trained using Q-learning methods, state-action-reward-state-action (SARSA) methods, deep Q-network methods, actor-critic methods, or any other method of training an agent 340 and model 342, so that agent 340 can control machine 100 based on model 342.
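A minimal sketch of the composition described by equations (4.1) to (4.3) is given below. The particular choices of I (measurements plus their change since the previous step), F (a stand-in for whatever trained model 342 the agent executes), and O (scaling and clipping to component tolerances) are assumptions made for the example, not the claimed implementation.

```python
import numpy as np

def input_function(measurements: np.ndarray, previous_measurements: np.ndarray) -> np.ndarray:
    """I: convert raw measurements m into a state vector s (eq. 4.2).
    Here the state is the measurements plus their change since the previous step."""
    return np.concatenate([measurements, measurements - previous_measurements])

def model_F(state: np.ndarray) -> np.ndarray:
    """F: placeholder for the trained model 342 that maps s to a (eq. 4.1)."""
    weights = np.ones((state.size, 3)) * 0.01  # stand-in parameters
    return np.tanh(state @ weights)

def output_function(action: np.ndarray, limits: np.ndarray) -> np.ndarray:
    """O: convert the action vector a into machine commands c (eq. 4.3),
    scaling and clipping each command to the tolerance of its component."""
    return np.clip(action * limits, -limits, limits)

m = np.array([0.2, 0.5, -0.1, 0.7])      # current measurements (made up)
prev_m = np.zeros_like(m)                 # measurements from the previous step
s = input_function(m, prev_m)             # s = I(m)
a = model_F(s)                            # a = F(s)
c = output_function(a, np.array([10.0, 5.0, 2.0]))  # c = O(a)
print(c)
```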
[0081] In the example where machine 100 is a combine 200, performance can be represented by any of a set of metrics, including one or more of the following: a measurement of the quantity of plant harvested, the threshing quality of the plant, the cleanliness of the harvested grain, the harvester yield, and the harvester plant loss. The quantity of harvested plant can be the quantity of grain entering the grain tank 217, the threshing quality can be the quantity, quality, or loss of the plant after threshing in the threshing basket 212, the cleanliness of the harvested grain can be the quality of the plant entering the grain tank, the harvester yield can be the amount of grain entering the grain tank 217 over a period of time, and the grain loss can be the amount of grain lost at the various stages of harvesting. As previously described, performance can be determined by control system 130 using measurements from any of the combine's sensors 330. Therefore, improving the performance of machine 100 may, in specific embodiments of the invention, include improving any one or more of these metrics, as determined by receiving improved measurements from machine 100 in relation to any one or more of these metrics.

V. Reinforcement Learning

[0082] In one embodiment, agent 340 can execute a model 342 including deterministic methods that have been trained with reinforcement learning (thus creating a reinforcement learning model). Model 342 is trained to increase the performance of machine 100 using measurements from sensors 330 as inputs and machine commands to input controllers 320 as outputs.

[0083] Reinforcement learning is a machine learning system in which a machine learns 'what to do' - how to map situations to actions - so as to maximize a numerical reward signal. The learner (for example, machine 100) is not told which actions to take (for example, which machine commands to generate for the input controllers 320 of components 120), but instead discovers which actions yield the most reward (for example, increasing the quality of harvested grain) by trying them. In some cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics - trial-and-error search and delayed reward - are two distinguishing characteristics of reinforcement learning.

[0084] Reinforcement learning is defined not by characterizing learning methods, but by characterizing a learning problem. Basically, a reinforcement learning system captures the important aspects of the problem faced by a learning agent interacting with its environment to achieve a goal. That is, in the example of a harvester, the reinforcement learning system captures the dynamics of the harvester 200 system as it harvests plants in a field. Such an agent senses the state of the environment and takes actions that affect the state in order to achieve one or more goals. In its most basic form, the formulation of reinforcement learning includes three aspects for the learner: sensation, action, and goal. Continuing with the example of the harvester 200, the harvester 200 senses the state of the environment with sensors, takes actions in that environment with machine controls, and achieves a goal that is a measurement of the harvester's performance when harvesting grain crops.
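As an illustration only, a scalar performance signal of the kind discussed above could be assembled from the combine metrics of paragraph [0081] as a weighted sum. The metric names, weights, and signs below are assumptions made for the example, not values taken from this application.

```python
# Hypothetical per-interval metrics derived from the sensors in section III.B.
metrics = {
    "grain_into_tank_kg": 120.0,     # quantity of plant harvested
    "damaged_grain_fraction": 0.03,  # threshing quality (lower is better)
    "mog_fraction": 0.05,            # cleanliness of harvested grain (lower is better)
    "grain_lost_kg": 2.5,            # loss at the separator and cleaning shoe
}

# Assumed weights expressing how much each metric matters to overall performance.
WEIGHTS = {
    "grain_into_tank_kg": 1.0,
    "damaged_grain_fraction": -200.0,
    "mog_fraction": -100.0,
    "grain_lost_kg": -5.0,
}

def performance(m: dict) -> float:
    """Combine the metrics into one reward-like performance number."""
    return sum(WEIGHTS[k] * v for k, v in m.items())

print(performance(metrics))  # higher values indicate better harvesting performance
```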
[0085] One of the challenges that arises in reinforcement learning is the trade-off between exploration and exploitation. To increase the reward in the system, a reinforcement learning agent prefers actions it has tried in the past that were effective in producing reward. However, to discover actions that produce reward, the learning agent must select actions it has not selected before. The agent 'exploits' the information it already has in order to obtain reward, but it also 'explores' in order to make better action selections in the future. The learning agent tries a variety of actions and progressively favors those that appear to be best, while still attempting new actions. In a stochastic task, each action is typically tried several times to obtain a reliable estimate of its expected reward. For example, if the harvester is running an agent that knows that a specific harvester speed leads to good system performance, the agent may still change the speed of the combine with a machine command to see whether the change in speed influences system performance.
[0086] In addition, reinforcement learning considers the whole problem of a goal-oriented agent interacting with an uncertain environment. Reinforcement learning agents have explicit goals, can sense aspects of their environments, and choose actions so as to receive large rewards (that is, to increase system performance). Moreover, agents generally operate despite significant uncertainty about the environment they face. When reinforcement learning involves planning, the system addresses the interplay between planning and selecting actions in real time, as well as the question of how models of the environment are acquired and improved. For reinforcement learning to make progress, important sub-problems must be isolated and studied, and such sub-problems have clear roles in complete, interactive, goal-oriented agents.
V.A The Agent-Environment Interface
[0087] The reinforcement learning problem is a framing of a machine learning problem in which interactions are processed and actions are performed to achieve a goal. The learner and decision maker is called the agent (for example, agent 340 of the harvester 200). The thing it interacts with, comprising everything external to the agent, is called the environment (for example, environment 300, plants 102, geographic area 104, the dynamics of the harvesting process, etc.). These two elements interact continuously, with the agent selecting actions (for example, machine commands for the input controllers 320) and the environment responding to those actions and presenting new situations to the agent. The environment also generates rewards, special numerical values that the agent tries to maximize over time. In this context, rewards work to maximize system performance over time. A complete specification of an environment defines a task, which is one instance of the reinforcement learning problem.
[0088] Figure 4 presents a diagram of the agent-environment interaction. More specifically, the agent (for example, agent 340 of the combine 200) and the environment interact in a sequence of discrete time steps, that is, t = 0, 1, 2, 3, etc. At each time step, the agent receives some representation of the state of the environment s_t (for example, sensor measurements representing a state of the machine 100). The states s_t lie in S, where S is the set of possible states. Based on the state s_t at time step t, the agent selects an action a_t (for example, a set of machine commands to change the configuration of a component 120). The action a_t lies in A(s_t), where A(s_t) is the set of possible actions. One time step later, partly as a consequence of its action, the agent receives a numerical reward r_{t+1}. The rewards r_{t+1} lie in R, where R is the set of possible rewards. Upon receiving the reward, the agent finds itself in a new state s_{t+1}. This interaction loop is sketched below.
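The following sketch illustrates the discrete-time interaction of Figure 4. The environment object and its reset and step methods are assumptions made so the example is self-contained (for the combine they stand for the physical field and machine dynamics); they are not an interface exposed by control system 130.

    # Illustrative agent-environment loop over discrete time steps t = 0, 1, 2, ...
    def run_episode(agent, environment, num_steps):
        total_reward = 0.0
        s = environment.reset()              # initial state s_0
        for t in range(num_steps):
            a = agent.select_action(s)       # action a_t chosen from A(s_t)
            s_next, r = environment.step(a)  # reward r_{t+1} and new state s_{t+1}
            agent.observe(s, a, r, s_next)   # optional learning update
            total_reward += r
            s = s_next
        return total_reward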
[0089] At each time step, the agent implements a mapping from states to probabilities of selecting each possible action. This mapping is called the agent's policy and is denoted π_t, where π_t(s, a) is the probability that a_t = a if s_t = s. Reinforcement learning methods dictate how the agent changes its policy as a result of the states and rewards that follow from its actions. The agent's goal is to maximize the total amount of reward it receives over time.
[0090] This reinforcement learning framework is flexible and can be applied to many different problems in many different ways (for example, to agricultural machines operating in a field). The framework proposes that, whatever the details of the sensory, memory, and control apparatus, any problem (or objective) of learning goal-oriented behavior can be reduced to three signals passing between an agent and its environment: a signal to represent the choices made by the agent (the actions), a signal to represent the basis on which the choices are made (the states), and a signal to define the agent's goal (the rewards).
[0091] Further, the time steps between actions and state measurements need not refer to fixed intervals of real time; they can refer to arbitrary successive stages of decision making and acting. Actions can be low-level controls, such as the voltages applied to a combine's motors, or high-level decisions, such as whether or not to plant a seed with a planter. Likewise, states can take a wide variety of forms. They can be completely determined by low-level sensations, such as direct sensor readings, or they can be higher-level, such as symbolic descriptions of soil quality. States can be based on previous sensations or can even be subjective. Likewise, actions can be based on previous actions or policies, or can be subjective. In general, actions can be any decisions the agent learns to make in order to obtain reward, and states can be anything the agent can know that is useful in selecting those actions.
[0092] Furthermore, the boundary between the agent and the environment is generally not simply physical. For example, certain aspects of the agricultural machine, for example the sensors 330 or the field in which it operates, can be considered part of the environment and not part of the agent. Generally, anything that cannot be changed by the agent at its discretion is considered external to the agent and part of the environment. The agent-environment boundary represents the limit of the agent's absolute control, not of the agent's knowledge. As an example, the size of an agricultural machine's tire can be part of the environment, since it cannot be changed by the agent, but the angle of rotation of the axle on which the tire resides can be part of the agent, since it is changeable, in this case being controllable through the machine's transmission.
In addition, the moisture of the soil over which the agricultural machine operates can be part of the environment, especially if it is measured before the agricultural machine passes over it; however, soil moisture can also be part of the agent if the agricultural machine is configured to measure moisture after passing over that part of the soil and after applying water or another liquid to it. Likewise, rewards are computed within the physical entity of the agricultural machine and its artificial learning system, but they are considered external to the agent.
[0093] The agent-environment boundary can be drawn in different places for different purposes. On an agricultural machine, many different agents can operate at the same time, each with its own boundary. For example, an agent can make high-level decisions (for example, increase the depth of seed planting) that become part of the states experienced by a lower-level agent (for example, the agent that controls air pressure in the seeder) that implements the high-level decisions. In practice, the agent-environment boundary can be determined based on states, actions, and rewards, and can be associated with a specific decision-making task of interest.
[0094] Specific states and actions vary widely between applications, and the way in which they are represented can strongly affect the performance of the reinforcement learning system implemented.
VI. Reinforcement Learning Methods
[0095] In this section, a variety of methodologies used for reinforcement learning are described. Any aspect of any of these methodologies can be applied to a reinforcement learning system within an agricultural machine operating in a field. Generally, the agent is the machine operating in the field, and the environment consists of the elements of the machine and the field that are not under the direct control of the agent. States are measurements of the environment and of how the machine is interacting with it, actions are decisions taken by the agent to affect the states, and rewards are a numerical representation of improvements (or declines) in the states.
VI.A Action-Value and State-Value Functions
[0096] Reinforcement learning models can be based on the estimation of state-value functions or action-value functions. These functions of states, or of state-action pairs, estimate how valuable it is for the agent to be in a given state (or how valuable it is to perform a given action in a given state). The notion of value is defined in terms of the future rewards the agent can expect, or, equivalently, in terms of the agent's expected return. The rewards the agent can expect to receive in the future depend on the actions it will take. Consequently, value functions are defined with respect to particular policies.
[0097] Recall that a policy, π, is a mapping from each state s ∈ S and action a ∈ A (or a ∈ A(s)) to the probability π(s, a) of taking action a when in state s. Given these definitions, the policy π plays the role of the function F in Equation 4.1. Informally, the value of a state s under a policy π, denoted V_π(s), is the expected return when starting in s and following π thereafter. Formally, V_π(s) can be defined as

V_\pi(s) = E_\pi\{ R_t \mid s_t = s \} = E_\pi\{ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s \}    (6.1)

where E_π{·} denotes the expected value given that the agent follows policy π, γ is a discount factor, and t is any time step. Note that the value of a terminal state, if any, is usually zero.
The function V_π is called the state-value function for the policy π.
[0098] Similarly, we define the value of taking action a in state s under a policy π, denoted Q_π(s, a), as the expected return starting from s, taking action a, and thereafter following policy π:

Q_\pi(s, a) = E_\pi\{ R_t \mid s_t = s, a_t = a \} = E_\pi\{ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s, a_t = a \}    (6.2)

where E_π{·} denotes the expected value given that the agent follows policy π, γ is a discount factor, and t is any time step. Note that the value of a terminal state, if any, is usually zero. The function Q_π can be called the action-value function for policy π.
[0099] The value functions V_π and Q_π can be estimated from experience. For example, if an agent follows policy π and maintains, for each state encountered, an average of the actual returns that followed that state, then the average converges to the state's value, V_π(s), as the number of times the state is encountered approaches infinity. If separate averages are kept for each action taken in a state, these averages converge in the same way to the action values, Q_π(s, a). We call estimation methods of this kind Monte Carlo (MC) methods because they involve averaging over many random samples of actual returns. In some cases there are very many states, and it may not be practical to keep separate averages for each state individually. Instead, the agent can maintain V_π and Q_π as parameterized functions and adjust the parameters to better match the observed returns. This can also produce accurate estimates, although much depends on the nature of the parameterized function approximator. An illustrative Monte Carlo estimate of this kind is sketched below.
[0100] A property of the state-value and action-value functions used in reinforcement learning and dynamic programming is that they satisfy particular recursive relationships. For any policy π and any state s, the following consistency condition holds between the value of s and the values of its possible successor states:

V_\pi(s) = E_\pi\{ R_t \mid s_t = s \}    (6.3)
         = E_\pi\{ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \mid s_t = s \}    (6.4)
         = E_\pi\{ r_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^k r_{t+k+2} \mid s_t = s \}    (6.5)
         = \sum_{a} \pi(s, a) \sum_{s'} P_{ss'}^{a} [ R_{ss'}^{a} + \gamma V_\pi(s') ]    (6.6)

where P denotes the transition probabilities between successive states given the actions taken from the set A(s), R represents the expected immediate rewards for actions taken from the set A(s), and the successor states s' are taken from the set S (or from the set S⁺ in the case of an episodic problem). Equation 6.6 is the Bellman equation for V_π. The Bellman equation expresses a relationship between the value of a state and the values of its successor states. In simpler terms, the equation is a way of looking ahead from one state to its possible successor states. From each of these, the environment can respond with one of several successor states s' together with a reward r. The Bellman equation averages over all the possibilities, weighting each by its probability of occurring. It states that the value of the starting state must equal the (discounted) value of the expected next state, plus the reward expected along the way. The value function V_π is the unique solution to its Bellman equation. These operations transfer value information back to a state (or state-action pair) from its successor states (or state-action pairs).
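As an illustration of the Monte Carlo estimation described in paragraph [0099], the sketch below maintains a running average of observed returns for every state visited in an episode. The episode format (a list of state and reward pairs) and the discount value are assumptions made for the example.

    # Illustrative every-visit Monte Carlo estimate of V_pi from sampled episodes.
    from collections import defaultdict

    def monte_carlo_value(episodes, gamma=0.9):
        returns_sum = defaultdict(float)
        returns_count = defaultdict(int)
        for episode in episodes:              # episode: [(s_0, r_1), (s_1, r_2), ...]
            g = 0.0
            for state, reward in reversed(episode):
                g = reward + gamma * g        # return following this visit
                returns_sum[state] += g
                returns_count[state] += 1
        return {s: returns_sum[s] / returns_count[s] for s in returns_sum}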
VI.B Policy Iteration
[0101] Continuing with the methods used in reinforcement learning systems, the description turns to policy iteration. Once a policy π has been improved using V_π to yield a better policy π', the system can then compute V_π' and improve it again to obtain an even better policy π''. The system can thus obtain a sequence of monotonically improving policies and value functions:

\pi_0 \xrightarrow{E} V_{\pi_0} \xrightarrow{I} \pi_1 \xrightarrow{E} V_{\pi_1} \xrightarrow{I} \pi_2 \xrightarrow{E} \cdots \xrightarrow{I} \pi_* \xrightarrow{E} V_*    (6.7)

where E denotes a policy evaluation and I denotes a policy improvement. Each policy is generally an improvement over the previous one (unless it is already optimal). In reinforcement learning models that have only a finite number of policies, this process converges to an optimal policy and an optimal value function in a finite number of iterations.
[0102] This way of finding an optimal policy is called policy iteration. An example model for policy iteration is provided in Figure 5A. Note that each policy evaluation, itself an iterative computation, begins with the value function (state or action) of the previous policy. This usually results in a large increase in the speed of convergence of policy evaluation.
VI.C Value Iteration
[0103] Continuing with the methods used in reinforcement learning systems, the description turns to value iteration. Value iteration is a special case of policy iteration in which policy evaluation is stopped after just one sweep (one backup of each state). It can be written as a particularly simple backup operation that combines the policy improvement and truncated policy evaluation steps:

V_{k+1}(s) = \max_a E\{ r_{t+1} + \gamma V_k(s_{t+1}) \mid s_t = s, a_t = a \}    (6.8)
           = \max_a \sum_{s'} P_{ss'}^{a} [ R_{ss'}^{a} + \gamma V_k(s') ]    (6.9)

for all s ∈ S, where max_a selects the action with the highest value. For arbitrary V_0, the sequence {V_k} converges to V* under the same conditions that guarantee the existence of V*.
[0104] Another way to understand value iteration is by reference to the Bellman equation (described earlier). Note that value iteration is obtained simply by turning the Bellman optimality equation into an update rule for a reinforcement learning model. Notice also how the value iteration backup is similar to the policy evaluation backup, except that the maximum is taken over all actions. Another way to see this close relationship is to compare the backup diagrams of these models. Both are natural backup operations for computing V_π and V*.
[0105] Like policy evaluation, value iteration formally requires an infinite number of iterations to converge exactly to V*. In practice, value iteration stops when the value function changes by only a small amount in a sweep. Figure 5B provides an example of a value iteration model with this kind of termination condition; an illustrative version of the same update is also sketched below.
[0106] Value iteration effectively combines, in each of its sweeps, one sweep of policy evaluation and one sweep of policy improvement. Faster convergence is often achieved by interposing multiple policy evaluation sweeps between each policy improvement sweep. In general, the entire class of truncated policy iteration models can be viewed as sequences of sweeps, some of which use policy evaluation backups and others of which use value iteration backups. Since the max operation is the only difference between these backups, this simply means that the max operation is added to some sweeps of policy evaluation.
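The following sketch implements the update of equations (6.8) and (6.9) for a small finite problem. The dictionaries holding the transition probabilities P and expected rewards R are assumptions made so the example is self-contained; they are not part of the control system described above.

    # Illustrative value iteration over a finite state/action space.
    def value_iteration(states, actions, P, R, gamma=0.9, theta=1e-6):
        """P[s][a] is a list of (probability, next_state) pairs;
        R[s][a][s_next] is the expected immediate reward."""
        V = {s: 0.0 for s in states}
        while True:
            delta = 0.0
            for s in states:
                backups = []
                for a in actions:
                    backups.append(sum(p * (R[s][a][s2] + gamma * V[s2])
                                       for p, s2 in P[s][a]))
                best = max(backups)                  # equation (6.9)
                delta = max(delta, abs(best - V[s]))
                V[s] = best
            if delta < theta:                        # stop when changes are small
                return V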
VI.D Temporal-Difference Learning
[0107] Both temporal-difference (TD) and MC methods use experience to solve the prediction problem. Given some experience following a policy π, both methods update their estimate V of V_π. If a non-terminal state s_t is visited at time t, both methods update their estimate V(s_t) based on what happens after that visit. Roughly speaking, Monte Carlo methods wait until the return following the visit is known and then use that return as a target for V(s_t). A simple every-visit MC method suitable for non-stationary environments is

V(s_t) \leftarrow V(s_t) + \alpha [ R_t - V(s_t) ]    (6.11)

where R_t is the actual return following time t and α is a constant step-size parameter. Whereas MC methods must wait until the end of the episode to determine the increment to V(s_t) (only then is R_t known), TD methods need wait only until the next time step. At time t+1 they immediately form a target and make an update using the observed reward r_{t+1} and the estimate V(s_{t+1}). The simplest TD method, known as TD(0), is

V(s_t) \leftarrow V(s_t) + \alpha [ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) ]    (6.12)

[0108] In effect, the target for the Monte Carlo update is R_t, while the target for the TD update is

r_{t+1} + \gamma V(s_{t+1})    (6.13)

[0109] Because the TD method bases its update on an existing estimate, we say that it is a bootstrapping method. From what has been said,

V_\pi(s) = E_\pi\{ R_t \mid s_t = s \}    (6.14)
         = E_\pi\{ r_{t+1} + \gamma \sum_{k=0}^{\infty} \gamma^k r_{t+k+2} \mid s_t = s \}    (6.15)

[0110] Roughly speaking, Monte Carlo methods use an estimate of 6.14 as their target, while dynamic programming (DP) methods use an estimate of 6.15 as their target. The MC target is an estimate because the expected value in 6.14 is not known; a sample return is used in place of the true expected return. The DP target is an estimate not because of the expected values, which are assumed to be provided completely by a model of the environment, but because V_π(s_{t+1}) is not known and the current estimate, V_t(s_{t+1}), is used in its place. The TD target is an estimate for both reasons: it samples the expected values in 6.15 and it uses the current estimate V_t instead of the true V_π. Thus, TD methods combine the sampling of MC with the bootstrapping of other reinforcement learning methods.
[0111] We refer to TD and Monte Carlo updates as sample backups because they involve looking ahead to a sample successor state (or state-action pair), using the value of the successor and the reward along the way to compute a backed-up value, and then changing the value of the original state (or state-action pair) accordingly. Sample backups differ from the full backups of DP methods in that they are based on a single sample successor rather than on a complete distribution over all possible successors. An example model for computing temporal differences is given procedurally in Figure 5C.
VI.E Q-Learning
[0112] Another method used in reinforcement learning systems is an off-policy TD control model known as Q-learning. Its simplest form, one-step Q-learning, is defined by

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) ]    (6.16)

[0113] In this case, the learned action-value function Q directly approximates Q*, the optimal action-value function, independently of the policy being followed. This simplifies the analysis of the model and enables early convergence proofs. The policy still has an effect, in that it determines which state-action pairs are visited and updated. However, all that is required for correct convergence is that all pairs continue to be updated. This is a minimal requirement, in the sense that any method guaranteed to find optimal behavior in the general case must require it. Under this assumption, a variant of the usual stochastic approximation conditions on the sequence of step-size parameters has been shown to converge with probability 1 to Q*. The Q-learning model is shown in procedural form in Figure 5D; an illustrative version of the updates (6.12) and (6.16) is also sketched below.
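As an illustration of equations (6.12) and (6.16), the sketch below applies one TD(0) update to a tabular state-value estimate and one one-step Q-learning update to a tabular action-value estimate. The use of plain dictionaries as tables, and the step-size and discount values, are assumptions made for the example.

    # Illustrative TD(0) and one-step Q-learning updates on tabular estimates.
    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        # Equation (6.12): move V(s_t) toward the TD target r + gamma * V(s_{t+1}).
        V[s] = V.get(s, 0.0) + alpha * (r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0))

    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Equation (6.16): the target uses the greedy value max_a Q(s_{t+1}, a).
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))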
VI.F Prediction of Values
[0114] Other methods used in reinforcement learning systems use value prediction. Generally, the methods discussed attempt to predict whether an action taken in the environment will increase the reward in the agent-environment system. Viewing each backup (that is, each previous state or state-action pair) as a conventional training example in this way allows any of several existing function approximation methods to be used for value prediction. In reinforcement learning, it is important that learning can take place online, while interacting with the environment or with a model (for example, a dynamic model) of the environment. Doing this requires methods capable of learning efficiently from incrementally acquired data. In addition, reinforcement learning generally requires function approximation methods capable of handling non-stationary target functions (target functions that change over time). Even if the policy remains the same, the target values of the training examples will be non-stationary if they are generated by bootstrapping (TD) methods. Methods that cannot easily handle such non-stationarity are less suitable for reinforcement learning.
VI.G Actor-Critic Training
[0115] Another example of a reinforcement learning method is the actor-critic method. The actor-critic method can use temporal-difference methods or direct policy search methods to determine a policy for the agent. The actor-critic method includes an agent with an actor and a critic. The actor takes as input state information about the environment and weight functions for the policy, and generates an action. The critic takes as input state information about the environment and a reward determined from the states, and generates the weight functions for the actor. The actor and the critic work together to develop a policy for the agent that maximizes the reward of its actions. Figure 5E illustrates an example of an agent-environment interface for an agent including an actor and a critic.
VI.H Additional Information
[0116] Further descriptions of various elements of reinforcement learning can be found in the publications Playing Atari with Deep Reinforcement Learning by Mnih et al., Continuous Control with Deep Reinforcement Learning by Lillicrap et al., and Asynchronous Methods for Deep Reinforcement Learning by Mnih et al., all of which are incorporated herein in their entirety by reference.
VII. Neural Networks and Reinforcement Learning
[0117] The model 342 described in Section V and Section VI can also be implemented using an artificial neural network (ANN). That is, agent 340 executes a model 342 that is an ANN. Model 342, including an ANN, determines output action vectors (machine commands) for machine 100 from input state vectors (measurements). The ANN is trained such that the actions represented by elements of the output action vectors improve the performance of machine 100.
[0118] Figure 6 is an illustration of an ANN 600 of model 342, according to an example embodiment.
The ANN 600 is based on a large collection of simple neural units 610. A neural unit 610 can correspond to an action a, a state s, or any function relating the actions a and states s of machine 100. Each neural unit 610 is connected to many others, and the connections 620 can enhance or inhibit adjacent neural units. Each individual neural unit 610 can compute its output using a summation function over all of its input connections 620. There may be a threshold or limiting function on each connection 620 and on each neural unit 610 itself, such that a neural unit's signal must exceed the limit before it propagates to other neural units. These systems are self-learning and trained (using the methods described in Section VI) rather than explicitly programmed. Here, the objective of the ANN is to improve the performance of machine 100 by providing outputs used to perform actions that interact with an environment, learning from those actions, and using the learned information to influence actions toward a future goal. In one embodiment, the learning process for training the ANN is similar to the policies and policy iteration described above. For example, in one embodiment, a machine 100 makes a first pass through a field to harvest a crop. Based on measurements of the machine state, agent 340 determines a reward that is used to train agent 340. With each pass through the field, agent 340 continues to train, using a policy iteration reinforcement learning model, to improve machine performance.
[0119] The neural network of Figure 6 includes two layers 630: an input layer 630A and an output layer 630B. The input layer 630A has input neural units 610A that send data through connections 620 to the output neural units 610B of the output layer 630B. In other configurations, an ANN can include additional hidden layers between the input layer 630A and the output layer 630B. Hidden layers can have neural units 610 connected to the input layer 630A, the output layer 630B, or other hidden layers, depending on the configuration of the ANN. Each layer can have any number of neural units 610 and can be connected to any number of neural units 610 in an adjacent layer 630. The connections 620 between neural layers can represent and store parameters, referred to here as weights, that affect the selection and propagation of data from the neural units 610 of one layer to the neural units 610 of adjacent layers. Reinforcement learning trains the various connections 620 and weights so that the output of ANN 600 generated from the input to ANN 600 improves the performance of machine 100. Finally, each neural unit 610 can be governed by an activation function that converts the weighted input of the neural unit into its output activation (that is, that activates a neural unit in a given layer). Some examples of activation functions that can be used are softmax, identity, binary step, logistic, tanh, arctan, softsign, rectified linear unit, parametric rectified linear unit, bent identity, sinc, Gaussian, or any other activation function for neural networks.
[0120] Mathematically, the function of an ANN (F(s), as introduced above) is defined as a composition of other subfunctions g_i(x), which can in turn be defined as compositions of further subfunctions. The ANN's function is a representation of the structure of the interconnected neural units, and this function can work to increase the performance of the agent in the environment. The function generally provides a smooth transition for the agent toward improved performance as the input state vectors change and the agent takes actions. A minimal sketch of such a forward pass is given below.
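The sketch below illustrates the composition F(s) = g_2(g_1(s)) for a small fully connected network of the kind shown in Figure 6, using tanh activations. The layer sizes, random initialization, and use of NumPy are assumptions made for the example and do not reflect the actual dimensions or parameters of model 342.

    # Illustrative forward pass of a small policy network: state vector in, action vector out.
    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)   # input layer -> hidden layer
    W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)    # hidden layer -> output layer

    def forward(state_vector):
        h = np.tanh(W1 @ state_vector + b1)   # g_1: weighted sum plus activation
        a = np.tanh(W2 @ h + b2)              # g_2: output action vector
        return a

    action_vector = forward(np.zeros(8))      # e.g., 8 state elements -> 4 action elements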
[0121] Generally, the ANN 600 takes input through the input neural units 610A and generates an output through the output neural units 610B. In some configurations, the neural units of the input layer 630A can be connected to an input state vector 640 (for example, s). The input state vector 640 can include any information about the agent's current or previous states, actions, and rewards in the environment (state elements 642). Each state element 642 of the input state vector 640 can be connected to any number of input neural units 610A. The input state vector 640 can be connected to the input neural units 610A such that the ANN 600 can generate an output at the output neural units 610B of the output layer 630B. The output neural units 610B can represent and influence the actions performed by the agent 340 executing model 342. In some configurations, the output neural units 610B can be connected to any number of action elements 652 of an output action vector (for example, a). Each action element can represent an action the agent can take to improve the performance of machine 100. In another configuration, the output neural units 610B are themselves the elements of an output action vector.
VII.A Agent Training Using Two ANNs
[0122] In one embodiment, similar to Figure 5E, agent 340 can execute a model 342 using an ANN trained with an actor-critic training method (as described in Section VI). The actor and the critic are two similarly configured ANNs, in that the input neural units, output neural units, input layers, output layers, and connections are similar when the ANNs are initialized. At each training iteration, the actor ANN receives an input state vector as input and, together with the weight functions that make up the actor ANN as they exist at that moment, produces an output action vector. The weight functions define the weights of the connections linking the neural units of the ANN. The agent performs an action in the environment, which can affect the state, and the agent measures the resulting state. The critic ANN receives an input state vector and a reward vector as input and, together with the weight functions that make up the critic ANN, produces weight functions to be provided to the actor ANN. The reward vector is used to modify the weighted connections of the critic ANN so that the weight functions passed to the actor ANN improve the performance of the machine. This process continues at each time step, with the critic ANN receiving rewards and states as inputs and providing weights to the actor ANN as outputs, and the actor ANN receiving weights and states as inputs and providing an action to the agent as output. A simplified sketch of this training loop is given below.
[0123] The actor-critic pair of ANNs works together to determine a policy that generates output action vectors representing actions that improve the performance of the combine given the input state vectors measured in the environment. Once training is complete and the actor-critic pair is said to have determined a policy, the critic ANN is discarded and the actor ANN is used as the model 342 for agent 340.
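The following sketch shows the shape of the training loop described in paragraph [0122], using a generic one-step actor-critic with linear function approximators on a toy problem. The toy environment, the use of a linear critic instead of a second ANN, and all parameter values are simplifying assumptions for illustration; this is not the training procedure actually used for model 342, but it shows the division of labor in which the critic's evaluation (here, a temporal-difference error) drives the updates to the actor's weights.

    # Illustrative one-step actor-critic on a toy problem with discrete actions.
    import numpy as np

    rng = np.random.default_rng(1)
    n_features, n_actions = 4, 3
    theta = np.zeros((n_actions, n_features))     # actor (policy) parameters
    w = np.zeros(n_features)                      # critic (value) parameters
    alpha_actor, alpha_critic, gamma = 0.01, 0.05, 0.9

    def softmax_policy(s):
        prefs = theta @ s
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    def toy_env_step(s, a):
        # Assumed dynamics: action 0 pulls the state toward zero, which pays more reward.
        s_next = s * (0.5 if a == 0 else 0.9) + rng.normal(scale=0.01, size=s.shape)
        return s_next, 1.0 - np.linalg.norm(s_next)

    s = rng.normal(size=n_features)
    for t in range(1000):
        probs = softmax_policy(s)
        a = rng.choice(n_actions, p=probs)
        s_next, r = toy_env_step(s, a)
        td_error = r + gamma * (w @ s_next) - (w @ s)   # critic evaluates the outcome
        w += alpha_critic * td_error * s                 # critic update
        grad_log = -probs[:, None] * s[None, :]          # gradient of log pi(a|s)
        grad_log[a] += s
        theta += alpha_actor * td_error * grad_log       # actor update driven by the critic
        s = s_next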
[0124] In this example, the reward data vector can include elements, each element representing a measurement of a combine performance metric after an action has been performed. The performance metrics can include, in one example, an amount of harvested grain, a threshing quality, the cleanliness of the harvested grain, the harvester yield, and a grain loss. The performance metrics can be determined from any of the measurements received from the sensors 330. Each element of the reward data vector is associated with a weight that defines a priority for that performance metric, so that certain performance metrics can be prioritized over others. In one embodiment, the reward vector is a linear combination of the different metrics. In some examples, the combine operator can set the weights for each performance metric by interacting with the control system interface 350. For example, the operator can specify that grain cleanliness is prioritized over threshing quality and de-prioritized relative to the amount of grain harvested. The critic ANN determines a weight function, including a number of modified weights for the connections of the actor ANN, based on the input state vector and the reward data vector.
[0125] Training of the ANNs can be carried out using real data obtained from machines operating on a plantation. Thus, in one configuration, the ANNs of the actor-critic method can be trained using a set of input state vectors from any number of combines performing any number of actions, based on output action vectors, while harvesting plants in the field. The input state vectors and output action vectors can be accessed from the memory of the control systems 130 of the various combines.
[0126] However, training ANNs can require a large amount of data that is difficult to obtain cheaply from machines operating in a field. Thus, in another configuration, the ANNs of the actor-critic method can be trained using a set of simulated input state vectors and output action vectors. The simulated vectors can be generated from a set of seed input state vectors and seed output action vectors obtained from harvesters. In this example, in some configurations, the simulated input state and output action vectors can originate from an ANN configured to generate actions that improve the performance of the machine.
VIII. Agent for a Harvester
[0127] This section describes an agent 340 executing a model 342 to improve the performance of a combine 200. In this example, model 342 is a reinforcement learning model implemented using an artificial neural network similar to the ANN of Figure 6. That is, the ANN includes an input layer with a number of input neural units and an output layer with a number of output neural units. Each input neural unit is connected to any number of output neural units by any number of weighted connections. Agent 340 feeds measurements from the combine 200 into the input neural units, and the model generates actions for the combine 200 at the output neural units. Agent 340 determines a set of machine commands, based on the output neural units, that represent actions for the combine that improve the combine's performance. Figure 7 is a method 700 for generating actions that improve harvester performance using an agent 340 that executes a model 342 including an artificial neural network trained using an actor-critic method. Method 700 can include any number of steps, more or fewer, and the steps can be performed in a different order.
[0128] First, the agent determines 710 an input state vector for model 342.
The elements of the input state vector can be determined from any number of measurements received from sensors 330 via network 310. Each measurement is a measure of a state of the machine 100.
[0129] The agent then inserts 720 the input state vector into model 342. Each element of the input vector is connected to any number of input neural units. The model 342 represents a function configured to generate, from the input state vector, actions that improve the performance of the combine 200. The model 342 therefore generates, at the output neural units, an output intended to improve the combine's performance. In an exemplary embodiment, the output neural units are connected to the elements of an output action vector, and each output neural unit can be connected to any element of the output action vector. Each element of the output action vector is an action that can be performed by a component 120 of the combine 200. In some examples, agent 340 determines a set of machine commands for the components 120 based on the elements of the output action vector.
[0130] Agent 340 then sends the machine commands to the input controllers 320 of the components 120, and in response the input controllers 320 actuate 730 the components based on the machine commands. The actuation 730 of the components 120 performs the action determined by model 342. In addition, the actuation 730 of the components 120 changes the state of the environment, and the sensors 330 measure the change in state.
[0131] Agent 340 again determines 710 an input state vector to be inserted 720 into the model and determines an output action and associated machine commands that actuate 730 components of the harvester as the harvester traverses the field and harvests plants. Over time, agent 340 works to increase the performance of the combine 200 when harvesting plants.
[0132] Table 1 describes several states that can be included in an input state vector. Table 1 also includes the measurement m associated with each state, the sensor(s) 330 that generate the measurement m, and a description of the measurement. The input state vector can additionally or alternatively include any other states determined from measurements generated by the sensors of combine 200. For example, in some configurations, the input state vector may include states determined previously from previous measurements m. In this case, the previously determined states (or measurements) can be stored in the memory systems of the control system 130. In another example, the input state vector may include changes between the current state and a previous state. An illustrative construction of such a state vector from the measurements of Table 1 is sketched after the table.

Table 1: States included in an input state vector

State (s)           | Measurement (m) | Sensor             | Description
Tailings Level      | %               | Tailings 366       | Amount of usable grain relative to the total material other than grain (MOG)
Separator Loss      | #               | Separator Loss 219 | Number of grain elements in contact with the separator loss sensor
Shoe Loss           | #               | Shoe Loss 221/223  | Number of grains coming into contact with the shoe loss sensors
Threshing Loss      | %               | Threshing Load 368 | Number of grain elements in contact with the threshing load sensor
Grain Damage        | %               | Grain Quality 370  | Amount of damaged grain relative to the amount of usable grain
MOG-L               | %               | Grain Quality 370  | Amount of light MOG relative to the amount of usable grain
MOG-H               | %               | Grain Quality 370  | Amount of heavy MOG relative to the amount of usable grain
Unthreshed Material | %               | Grain Quality 370  | Amount of unthreshed material relative to the amount of usable grain
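Purely as an illustration of paragraphs [0128] through [0132], the sketch below assembles an input state vector from named measurements such as those in Table 1 and appends the change relative to the previous state. The field names and their ordering are assumptions made for the example, not the encoding used by control system 130.

    # Illustrative assembly of an input state vector from Table 1 style measurements.
    import numpy as np

    STATE_ORDER = ["tailings_level", "separator_loss", "shoe_loss", "threshing_loss",
                   "grain_damage", "mog_light", "mog_heavy", "unthreshed_material"]

    def build_state_vector(measurements, previous=None):
        s = np.array([float(measurements[name]) for name in STATE_ORDER])
        delta = s - previous if previous is not None else np.zeros_like(s)
        return np.concatenate([s, delta])   # current states plus change from the prior step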
[0133] Table 2 describes several actions that can be included in an output action vector. Table 2 also includes the machine controller that receives machine commands based on the actions included in the output action vector, a high-level description of how each input controller 320 actuates its respective component 120, and the units of the actuation change.

Table 2: Actions included in an output action vector

Action (a)          | Controller              | Description                                                                         | Units
Vehicle Speed       | Vehicle 388             | Changes the speed of the combine using the engine                                   | mph
Rotor Speed         | Rotor 384               | Changes the rotational speed of the rotor using the motor                           | rpm
Threshing Clearance | Threshing Clearance 390 | Changes the separation between the rotor and the threshing basket                   | mm
Fan Angle           | Threshing Clearance 390 | Changes the angle of the threshing fan relative to the crop entering the harvester  | deg
Top Sieve Opening   | Top Sieve 380           | Changes the sieve separation of the upper sieve                                     | mm
Lower Sieve Opening | Lower Sieve 382         | Changes the sieve separation of the lower sieve                                     | mm
Fan Speed           | Fan 386                 | Changes the fan speed                                                               | rpm
Head Height         | Head 392                | Changes the height of the head relative to the ground                               | mm

[0134] In one example, agent 340 runs a model 342 that is not being actively trained using the reinforcement techniques described in Section VI. In this case, the agent can be a model that has been trained beforehand using the actor-critic methods described in Section VII.A. That is, the agent is not actively re-weighting the connections of the neural network. The agent can also include several models that have been trained to optimize different performance metrics for the combine. The combine user can select among the performance metrics to optimize, and thereby change models, using the interface of the control system 130.
[0135] In other examples, the agent can actively train model 342 using reinforcement techniques. In this case, the model 342 generates a reward vector including a weight function that modifies the weights of any of the connections included in the model 342. The reward vector can be configured to reward various metrics, including the performance of the combine as a whole, a particular state, a change of state, etc. In some instances, the combine user can select which metrics to reward using the interface of the control system 130.
IX. Control System
[0136] Figure 8 is a block diagram illustrating components of an example machine capable of reading and executing instructions from a machine-readable medium. Specifically, Figure 8 shows a schematic representation of network 310 and control system 130 in the exemplary form of a computer system 800. Computer system 800 can be used to execute instructions 824 (for example, program code or software) to cause the machine to execute any one or more of the methodologies (or processes) described here. In alternative embodiments, the machine operates as a standalone device or as a connected (for example, networked) device that connects to other machines. In a networked deployment, the machine can operate as a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.
[0137] The machine can be a server computer, a client computer, a personal computer (PC), a tablet, a set-top box (STB), a smartphone, an Internet of Things (IoT) device, a network router, switch, or bridge, or any machine capable of executing the instructions 824 (sequentially or otherwise) that specify actions to be taken by that machine. Furthermore, although only a single machine is illustrated, the term machine should also be taken to include any collection of machines that individually or jointly execute the instructions 824 to perform any one or more of the methodologies discussed here.
[0138] The example computer system 800 includes one or more processing units (generally, a processor 802). The processor 802 is, for example, a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), a controller, a state machine, one or more application-specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these. The computer system 800 also includes a main memory 804. The computer system can include a storage unit 816. The processor 802, the memory 804, and the storage unit 816 communicate over a bus 808.
[0139] In addition, the computer system 800 can include a static memory 806 and a graphics display 810 (for example, to drive a plasma display panel (PDP), a liquid crystal display (LCD), or a projector). The computer system 800 can also include an alphanumeric input device 812 (for example, a keyboard), a cursor control device 814 (for example, a mouse, trackball, joystick, motion sensor, or other pointing instrument), a signal generation device 818 (for example, a speaker), and a network interface device 820, which are also configured to communicate over the bus 808.
[0140] The storage unit 816 includes a machine-readable medium 822 on which are stored the instructions 824 (for example, software) embodying any one or more of the methodologies or functions described here. For example, the instructions 824 may include the functionality of the modules of the control system 130 described in FIG. 2. The instructions 824 may also reside, completely or at least partially, in the main memory 804 or in the processor 802 (for example, in the processor's cache memory) during their execution by the computer system 800, the main memory 804 and the processor 802 also constituting machine-readable media. The instructions 824 can be transmitted or received over a network 826 through the network interface device 820.
X. Additional Considerations
[0141] In the description above, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the illustrated system and its operations. It will be apparent to one skilled in the art, however, that the system can be operated without these specific details. In other instances, structures and devices are shown in block diagram form so as not to obscure the system.
[0142] References to 'one embodiment' or 'an embodiment' mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the system. The appearances of the phrase 'in one embodiment' in various places in the specification are not necessarily all referring to the same embodiment.
[0143] Some portions of the detailed description are presented in terms of algorithms or models and symbolic representations of operations on data bits within a computer memory. An algorithm is here, and generally, conceived to be a sequence of steps leading to a desired result. The steps are those requiring physical transformations or manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
[0144] It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to those quantities. Unless specifically stated otherwise as apparent from the following discussion, it should be appreciated that, throughout the description, discussions using terms such as 'processing' or 'computing' or 'calculating' or 'determining' or 'displaying' or the like refer to the actions and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system's memories or registers or other such information storage, transmission, or display devices.
[0145] Some of the operations described here are performed by a computer physically mounted within a machine 100. This computer may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a program stored in the computer. Such a computer program may be stored in a computer-readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, or any type of non-transitory computer-readable storage medium suitable for storing electronic instructions.
[0146] The figures and the description above relate to various embodiments for purposes of illustration only. It should be noted that, from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.
[0147] One or more embodiments have been described above, examples of which are illustrated in the accompanying figures. Note that, wherever practicable, similar reference numbers may be used in the figures and may indicate similar functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the description that alternative embodiments of the structures and methods illustrated in this document can be used without departing from the principles described here.
[0148] Some embodiments may be described using the terms 'coupled' and 'connected' together with their derivatives. It should be understood that these terms are not intended as synonyms for each other.
For example, some embodiments may be described using the term 'connected' to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term 'coupled' to indicate that two or more elements are in direct physical or electrical contact. The term 'coupled', however, can also mean that two or more elements are not in direct physical or electrical contact with each other, but still cooperate or interact with each other. The embodiments are not limited in this context.
[0149] As used in this document, the terms 'comprises', 'comprising', 'includes', 'including', 'has', 'having', or any other variation thereof are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements, but may include other elements not expressly listed or inherent to that process, method, article, or apparatus. In addition, unless expressly stated to the contrary, 'or' refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
[0150] In addition, the use of the indefinite articles 'a' and 'an' is employed to describe elements and components of the embodiments of this invention. This is done merely for convenience and to give a general sense of the system. This description should be read to include one or at least one, and the singular also includes the plural, unless it is obvious that the contrary is meant.
[0151] Upon reading this disclosure, those skilled in the art will appreciate still additional alternative structural and functional designs for a system and a process for controlling the actuation mechanisms of a plurality of components of a harvester using the principles disclosed here. Thus, while particular embodiments and applications have been illustrated and described, it should be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes, and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation, and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.
Claims
1. Method for controlling the actuation mechanisms of a plurality of components of a harvester to harvest plants as the harvester traverses a plantation, characterized in that the method comprises:
- determining a state vector comprising a plurality of state elements, each of the state elements representing a measurement of the state of a subset of the combine components, each of the components being communicatively controlled by an actuation controller coupled to a combine-mounted computer;
- inserting, using the computer, the state vector into a control model to generate an action vector comprising a plurality of action elements for the harvester, each of the action elements specifying an action to be taken by the harvester in the plantation field, the actions collectively being expected to achieve an improved harvesting performance for the harvester; and
- actuating a subset of the actuation controllers to execute the actions in the plantation based on the action vector, the subset of controllers modifying a configuration of the subset of components so that the state of the combine changes.
2. Method, according to claim 1, characterized in that the control model comprises a function representing the relationship between the state vector received as input to the control model and the action vector generated as output from the control model, and wherein the function is a model trained using reinforcement learning to reward actions that improve the harvesting performance of the combine.
3. Method, according to claim 1, characterized in that the control model comprises an artificial neural network comprising: a plurality of neural nodes including a set of input nodes for receiving inputs to the artificial neural network and a set of output nodes for the output of the artificial neural network, wherein each neural node represents a subfunction for determining an output of the artificial neural network from the input of the artificial neural network, and each input node is connected to one or more output nodes by a connection of a plurality of weighted connections; and a function configured to generate actions for the combine that improve its performance, the function being defined by the subfunctions and weighted connections of the artificial neural network.
4. Method, according to claim 3, characterized in that: each state element of the state vector is connected to one or more input nodes by a connection of the plurality of weighted connections, each action element of the action vector is connected to one or more output nodes by a connection of the plurality of weighted connections, and the function is configured to generate the action elements of the action vector from the state elements of the state vector.
5. Method, according to claim 3, characterized in that the artificial neural network is a first artificial neural network of a pair of similarly configured artificial neural networks acting as an actor-critic pair and used to train the first artificial neural network to generate actions that improve the harvester's performance.
6. Method, according to claim 5, characterized in that: the first neural network takes as input state vectors and values for the weighted connections and produces action vectors, the values of the weighted connections modifying the function to generate actions for the combine that improve its performance, and the second neural network takes as input a reward vector and a state vector and produces the values for the weighted connections, the reward vector comprising elements that signal the improvement of the harvester's performance resulting from a previously performed action.
7. Method, according to claim 5, characterized in that the elements of the reward vector are determined using measurements of the capacities of a subset of combine components that were previously actuated based on the previously performed action.
8. Method, according to claim 5, characterized in that the operator can select a metric for performance improvement, the metric including any of productivity, plant cleanliness, quantity of plants harvested, quality of plants harvested, quality of threshed plants, and amount of plant loss.
9. Method, according to claim 5, characterized in that the state vectors are obtained from a plurality of harvesters taking a plurality of actions from a plurality of action vectors to harvest plants in the plantation.
10. Method, according to claim 5, characterized in that the state vectors and action vectors are simulated from a set of seed state vectors obtained from a plurality of harvesters receiving a set of actions from a seed set of action vectors to harvest plants in the plantation.
11. Method, according to claim 1, characterized in that determining a state data vector comprises: accessing a data stream communicatively coupling a plurality of sensors, each sensor providing a measurement of one of the capabilities of a subset of the combine components; and determining the elements of the state vector based on the measurements included in the data stream.
12. Method, according to claim 11, characterized in that the plurality of sensors can include a threshing gap sensor, a tailings level sensor, a separator loss sensor, a shoe loss sensor, a grain damage sensor, a non-grain material sensor, and a non-threshed grain sensor.
13. Method, according to claim 1, characterized in that the state elements can include: the tailings level, representing a ratio between usable plant and non-plant materials in the tailings of a cleaning shoe component of the combine; the separator loss, representing the amount of plant lost in the separator component of the combine; the shoe loss, representing the amount of plant lost in the shoe component of the combine; the threshing loss, representing the amount of plant lost in the threshing component of the combine; the grain damage, representing the amount of plant damaged in the grain tank component of the combine; a light non-plant material, representing a ratio of usable plant to light non-plant material in the grain tank component of the combine; a heavy non-plant material, representing a ratio of usable plant to heavy non-plant material in the grain tank component of the combine; and the non-threshed plant, representing a ratio between usable plant and non-threshed plant in the grain tank component of the combine.
14. Method, according to claim 1, characterized by the fact that actuating a subset of the actuation controllers comprises:
determining a set of machine instructions for each controller in the subset, such that the machine instructions modify the configuration of each component when received by the actuation controller;
accessing a data stream communicatively coupled to the actuation controllers; and
sending the set of machine instructions to each actuator in the subset via the data stream.

15. Method, according to claim 1, characterized by the fact that the action elements can specify actions that include:
modifying the speed of the combine;
modifying the rotor speed of a rotor component of the combine;
modifying a threshing gap distance between a threshing gap component and the rotor component of the combine;
modifying a fan angle between a rotor and the direction of the plant material entering the harvester;
modifying an upper sieve opening;
modifying a lower sieve opening; and
modifying the fan speed of a fan component of the combine.

16. Method, according to claim 1, characterized by the fact that the plurality of components of the combine harvester can include a rotor, an engine, a threshing basket, a header, an upper sieve, a lower sieve, a grain elevator, a grain tank, a fan, a separating fan, or a cleaning shoe.

17. Method, according to claim 1, characterized by the fact that the combine components are configured to harvest plants including corn, wheat, or rice.

18. Method, according to claim 1, characterized by the fact that the action elements of the action vector are a numerical representation of the action.

19. Method, according to claim 1, characterized by the fact that the state elements of the state vector are numerical representations of the measurements.
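For illustration only, claims 14, 15, 18 and 19 can be read together as a mapping from the numerical action elements to machine instructions for the actuation controllers. The setting names, numeric ranges, the [-1, 1] scaling, and the send() interface below are hypothetical choices made for the sketch; the application does not specify an instruction format.

```python
from dataclasses import dataclass

# Assumed: each action element lies in [-1, 1] and is rescaled into a physical setting range.
ACTION_RANGES = {
    "combine_speed_kph": (2.0, 10.0),
    "rotor_speed_rpm":   (300.0, 1100.0),
    "threshing_gap_mm":  (5.0, 40.0),
    "fan_angle_deg":     (0.0, 45.0),
    "upper_sieve_mm":    (5.0, 25.0),
    "lower_sieve_mm":    (3.0, 15.0),
    "fan_speed_rpm":     (600.0, 1400.0),
}

@dataclass
class MachineInstruction:
    setting: str
    value: float

def to_instructions(action_vector):
    """Scale each numerical action element (claim 18) into a setting change (claim 15)."""
    instructions = []
    for (setting, (lo, hi)), a in zip(ACTION_RANGES.items(), action_vector):
        value = lo + (a + 1.0) / 2.0 * (hi - lo)   # map [-1, 1] onto [lo, hi]
        instructions.append(MachineInstruction(setting, value))
    return instructions

def send_all(controllers, instructions):
    """Push each instruction onto the data stream coupled to its controller (claim 14)."""
    for controller, instruction in zip(controllers, instructions):
        controller.send(instruction)   # hypothetical controller API
```

In the loop from the earlier sketch, to_instructions would sit between control_model and send_all, turning the model's output into configuration changes on the combine.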
Patent family:
Publication number | Publication date
EP3582603A4 | 2021-01-06
CN110740635A | 2020-01-31
EP3582603A1 | 2019-12-25
WO2018175641A1 | 2018-09-27
US20180271015A1 | 2018-09-27
Cited references:
Publication number | Filing date | Publication date | Applicant | Title
US5448681A | 1992-03-27 | 1995-09-05 | National Semiconductor Corporation | Intelligent controller with neural network and reinforcement learning
AU658066B2 | 1992-09-10 | 1995-03-30 | Deere & Company | Neural network based control system
US6553300B2 | 2001-07-16 | 2003-04-22 | Deere & Company | Harvester with intelligent hybrid control system
CN101715675A | 2009-12-22 | 2010-06-02 | Jiangsu University | Photoelectric type corn growing density online detection method and device thereof
US9015093B1 | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks
US9629308B2 | 2011-03-11 | 2017-04-25 | Intelligent Agricultural Solutions, Llc | Harvesting machine capable of automatic adjustment
CA2852938A1 | 2011-10-21 | 2013-04-25 | Pioneer Hi-Bred International, Inc. | Combine harvester and associated method for gathering grain
DE102012220109A1 | 2012-11-05 | 2014-05-08 | Deere & Company | Device for detecting the operating state of a work machine
US9897429B2 | 2013-12-20 | 2018-02-20 | Harvest Croo, Llc | Harvester suspension
US20150195991A1 | 2014-01-15 | 2015-07-16 | Cnh America Llc | Header height control system for an agricultural harvester
US10426087B2 | 2014-04-11 | 2019-10-01 | Deere & Company | User interface performance graph for operation of a mobile machine
DE102014113008A1 | 2014-09-10 | 2016-03-10 | Claas Selbstfahrende Erntemaschinen Gmbh | Method for operating a combine harvester
US9630318B2 | 2014-10-02 | 2017-04-25 | Brain Corporation | Feature detection apparatus and methods for training of robotic navigation
US9779330B2 | 2014-12-26 | 2017-10-03 | Deere & Company | Grain quality monitoring
CN104737707B | 2015-03-04 | 2017-03-01 | Jiangsu University | A kind of combined harvester cleans percentage of impurity adaptive controller and adaptive cleaning method
DE102015004343A1 | 2015-04-02 | 2016-10-06 | Claas Selbstfahrende Erntemaschinen Gmbh | Harvester
AU2016297852C1 | 2015-07-24 | 2019-12-05 | Deepmind Technologies Limited | Continuous control with deep reinforcement learning
US10028435B2 | 2016-03-04 | 2018-07-24 | Deere & Company | Sensor calibration using field information
US20190064791A1 | 2016-05-09 | 2019-02-28 | Strong Force Iot Portfolio 2016, Llc | Methods and systems for detection in an industrial internet of things data collection environment with intelligent management of data selection in high data volume data streams
DE202016104858U1 | 2016-09-02 | 2016-09-15 | Claas Saulgau Gmbh | Control device for operating an agricultural transport vehicle and trolley
US10699185B2 | 2017-01-26 | 2020-06-30 | The Climate Corporation | Crop yield estimation using agronomic neural network
US11082720B2 | 2017-11-21 | 2021-08-03 | Nvidia Corporation | Using residual video data resulting from a compression of original video data to improve a decompression of the original video data
US10687466B2 | 2018-01-29 | 2020-06-23 | Cnh Industrial America Llc | Predictive header height control system
US20190357520A1 | 2018-05-24 | 2019-11-28 | Blue River Technology Inc. | Boom sprayer including machine feedback control
US11240961B2 | 2018-10-26 | 2022-02-08 | Deere & Company | Controlling a harvesting machine based on a geo-spatial representation indicating where the harvesting machine is likely to reach capacity
US11178818B2 | 2018-10-26 | 2021-11-23 | Deere & Company | Harvesting machine control system with fill level processing based on yield data
US11129331B2 | 2019-01-04 | 2021-09-28 | Cnh Industrial America Llc | Steering control system for harvester and methods of using the same
CN109885959B | 2019-03-05 | 2019-09-27 | Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences | A kind of surface temperature robust NO emissions reduction method
US11079725B2 | 2019-04-10 | 2021-08-03 | Deere & Company | Machine control using real-time model
US11234366B2 | 2019-04-10 | 2022-02-01 | Deere & Company | Image selection for machine control
WO2021131317A1 | 2019-12-26 | 2021-07-01 | Kubota Corporation | Threshing state management system, threshing state management method, threshing state management program, recording medium recording threshing state management program, harvester management system, harvester, harvester management method, harvester management program, recording medium recording harvester management program, work vehicle, work vehicle management method, work vehicle management system, work vehicle management program, recording medium recording work vehicle management program, management system, management method, management program, and recording medium recording management program
US20210339809A1 | 2020-04-30 | 2021-11-04 | Deere & Company | Implement recognition lighting
CN111591893A | 2020-05-27 | 2020-08-28 | Taiyuan University of Science and Technology | Method for measuring hoisting load of automobile crane based on neural network
CN112616425B | 2021-03-08 | 2021-06-04 | Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs | On-line detection method, system and device for operation performance of grain combine harvester
Legal status:
2021-10-19 | B350 | Update of information on the portal [chapter 15.35 of the patent gazette]
Priority:
Application number | Filing date | Title
US201762474563P | true | 2017-03-21 | 2017-03-21
US201762475118P | true | 2017-03-22 | 2017-03-22
PCT/US2018/023638 | WO2018175641A1 | 2017-03-21 | 2018-03-21 | Combine harvester including machine feedback control